Obtaining EQ-5D-3L utility index from the health status scale of traditional Chinese medicine (TCM-HSS) based on a mapping study

Background Almost all traditional Chinese medicine (TCM) quality of life measures are non-preference-based measures (non-PBMs), which do not provide utilities for cost-utility analysis in pharmacoeconomic evaluation. Mapping has emerged as a way to obtain utilities by building a bridge between non-PBMs and preference-based measures (PBMs). Purpose To develop mapping algorithms from the health status scale of traditional Chinese medicine (TCM-HSS) onto the three-level EuroQol five-dimensional questionnaire (EQ-5D-3L). Methods Cross-sectional data were collected by questionnaire survey from people visiting a tertiary hospital and from community residents in China, and were randomly divided into a training set and a validation set at a ratio of 2:1. Based on the training set, direct and indirect mapping methods (7 regression methods and 4 model specifications) were used to build candidate models, which were evaluated on the validation set by mean absolute error, root mean square error, and Spearman correlation coefficient between predicted and observed values. The preferred mapping algorithm was then developed on the whole sample. Results A total of 639 respondents were included, with an average age of 45.24 years; 61.66% were female. The mean EQ-5D-3L index was 0.9225 (SD = 0.1458), and the mean TCM-HSS score was 3.4144 (SD = 3.1154). The final mapping algorithm was a two-part regression model including the TCM-HSS subscales, interaction terms, and demographic covariates (age and gender). The prediction performance was good: the mean error was 0.0003, the mean absolute error was 0.0566, the root mean square error was 0.1039, and 83.10% of the prediction errors were within 0.1; the Spearman correlation coefficient between predicted and observed EQ-5D-3L values was 0.6479.
Conclusion This is the first study to develop a mapping algorithm between the TCM-HSS and EQ-5D-3L. The algorithm demonstrates excellent prediction accuracy and allows utility values for economic evaluation to be estimated from TCM quality of life measures. Supplementary Information The online version contains supplementary material available at 10.1186/s12955-022-02076-9.


Introduction
Pharmacoeconomic evaluation (PE), an effective method for the rational allocation of health resources, has become an important basis for health decision-making by governments around the world. The "Interim Measures for the Administration of Medication for Basic Medical Insurance" [1] published by China clearly require companies to submit PE-related materials in the adjustment process of the medical insurance drug list. The National Medical Security Administration of China, established in 2018, has carried out four consecutive rounds of negotiation for the medical insurance drug list. The proportion of Chinese patent medicines in the list has gradually approached that of Western medicines (49.43%, 49.07% and 48.04% in the past three years). With the continuous improvement of the dynamic adjustment mechanism of the Chinese medical insurance catalogue, the economic evaluation of traditional Chinese medicine (TCM) will inevitably become a routine task, and high-quality, standardized PE evidence for TCM is urgently needed.
Cost-utility analysis (CUA) is one of the common methods in PE and the preferred recommendation in international pharmacoeconomic guidelines [2], and the key to CUA is the measurement of health utility (HU). Health utility is usually obtained indirectly through a preference-based measure that has an established utility value scoring system, such as the three-level EuroQol five-dimensional questionnaire (EQ-5D-3L), the most widely applied in the world. Nevertheless, traditional Chinese medicine quality of life measures are mostly not preference-based; they lack utility value scoring systems and cannot measure utility values directly. Yu [3] systematically reviewed the Chinese and English databases and found eight generic TCM quality of life scales, none of which was preference-based, including the health status scale of traditional Chinese medicine (TCM-HSS), the Chinese quality of life instrument (Ch QOL), and the health scale of traditional Chinese medicine (HSTCM). Wang [4] searched Chinese databases and found 39 TCM disease-specific quality of life scales, all of which were non-utility measures, such as the quality of life scale in patients with chronic hepatitis B (AOL-CHB) and the quality of life scale for chronic eczema patients-prior test version (EQOLS). Fortunately, the mapping technique, also known as cross-walking, is recommended by the UK National Institute for Health and Care Excellence (NICE) [5], which offers a way for non-preference-based measures (non-PBMs) to yield utility values.
The mapping method predicts utility values from non-PBMs by building mapping algorithms between the non-PBMs (also called starting scales) and preference-based measures (PBMs, also called target scales). However, the mapping method has not yet been applied in the field of TCM, probably because differences in cultural background and medical philosophy between Chinese and Western medicine make it more difficult to link Chinese and Western scales. It is therefore essential to conduct a first study as a reference for subsequent studies of this kind. Among the existing generic TCM quality of life scales, the TCM-HSS, a non-PBM, has undergone the most comprehensive reliability and validity assessment, and the test results showed excellent reliability and validity [6]. The measure is fully available, together with a clear scoring method covering item, subscale and total scores provided by the developers [6]. In addition, the TCM-HSS has been widely used in various studies. Some scholars have used it in clinical investigations to assess its responsiveness across disease types: Zeng [7,8] investigated patients with four diseases (coronary heart disease, rheumatoid arthritis, chronic obstructive pulmonary disease, chronic renal failure) using the TCM-HSS and SF-36, and Liu [9] used the TCM-HSS to assess changes in the health status of patients with functional gastrointestinal disease before and after treatment. The TCM-HSS has also been applied to evaluate the health status of the general population; for example, Chen [10] adopted it as a survey instrument to investigate the quality of life of college students and explored the factors affecting TCM-HSS scores.
Therefore, the aim of this study is to develop a mapping algorithm between the TCM-HSS (the starting scale) and the EQ-5D-3L (the target scale), so as to overcome the technical barrier that TCM quality of life measures cannot yield utility values, improve the level of evidence in the economic evaluation of traditional Chinese medicine, and promote the high-quality development of the traditional Chinese medicine industry.

Data
People who visited a tertiary hospital in Jiangsu Province from February to May 2021 were recruited through a combination of online and offline questionnaire surveys. Additionally, an online survey of community residents was conducted during the same period. The inclusion criteria were: (1) aged 16 or above, (2) informed consent to the survey, (3) no cognitive impairment and able to complete the questionnaire independently. Participants who did not meet the inclusion criteria were excluded. Participants completed the TCM-HSS and EQ-5D-3L at the same time. We also collected demographic and health-related information including age, gender, marital status, education level, smoking, drinking, etc. The total sample combined the two groups above, and 23 records with serious missing information were removed.

EQ-5D-3L
EQ-5D-3L, a generic preference-based measure, was developed by the European Quality of Life Group [11]. It is one of the most widely used quality of life measures in the world. More than a dozen countries have developed utility value scoring systems based on the preferences of their own populations. The Chinese health utility scoring system for the EQ-5D-3L was developed by Liu et al. [12] in 2014 based on the Chinese general population. The EQ-5D-3L includes 5 dimensions: mobility, self-care, usual activity, pain/discomfort, and anxiety/depression. Each dimension has 3 levels (no difficulty, some difficulty and extreme difficulty), so the instrument can describe 3^5 = 243 different health states.
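As a quick check on the count of health states, the profiles can be enumerated directly. The sketch below assumes the conventional coding of levels as 1 to 3 and writes each profile as a five-digit string; the dimension identifiers are illustrative names, not taken from the instrument documentation.

```python
from itertools import product

# Illustrative identifiers for the five EQ-5D-3L dimensions (names are assumptions).
DIMENSIONS = (
    "mobility",
    "self_care",
    "usual_activity",
    "pain_discomfort",
    "anxiety_depression",
)
LEVELS = (1, 2, 3)  # assumed coding: 1 = no, 2 = some, 3 = extreme difficulty


def all_health_states():
    """Return every EQ-5D-3L profile as a five-digit string such as '11111'."""
    return [
        "".join(str(level) for level in profile)
        for profile in product(LEVELS, repeat=len(DIMENSIONS))
    ]


states = all_health_states()
print(len(states))  # 243
```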

TCM-HSS
TCM-HSS, a generic non-preference-based measure of TCM, was developed by Professor Liu F B's team [13] under the guidance of traditional Chinese medicine theory and is widely used in the clinical field of TCM in China. The initial version in 2008, including 8 subscales and 30 items, was revised and improved to form the final version with 33 items in 8 subscales in 2009 [6]. The eight subscales are energy, pain, diet, stool, urination, sleeping, physical and mood, which together cover 32 items, and each subscale consists of 2-8 items. The last item is an overall evaluation of health. Each item is answered on a 4-level rating (very good, general, slightly poor, very poor) scored from 0 to 3 points, and a lower score indicates better quality of life. The score of each subscale = sum of the scores of the items belonging to the subscale / the number of items belonging to the subscale, and the total score of the scale = sum of the scores of all items. A table comparing the structural components of the TCM-HSS and EQ-5D-3L is provided in Additional file 1: Table S1.
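The scoring rule as worded above can be sketched in a few lines. This is only an illustration: the assignment of items to subscales is not given in the text, so the example responses and subscale sizes below are hypothetical, and the function names are our own.

```python
VALID_ITEM_SCORES = (0, 1, 2, 3)  # 4-level rating, lower is better


def subscale_score(item_scores):
    """Mean of the item scores belonging to one subscale."""
    if not item_scores:
        raise ValueError("a subscale needs at least one item")
    if any(score not in VALID_ITEM_SCORES for score in item_scores):
        raise ValueError("item scores must be 0, 1, 2 or 3")
    return sum(item_scores) / len(item_scores)


def total_score(responses):
    """Sum of all item scores, following the rule as stated in the text."""
    return sum(sum(items) for items in responses.values())


# Hypothetical responses for two subscales only (item counts are made up).
responses = {"energy": [0, 1, 2, 1], "pain": [0, 0]}
print(subscale_score(responses["energy"]))  # 1.0
print(total_score(responses))               # 4
```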

Statistical methods
Because the two scales were developed in different contexts, correlations were calculated before the mapping process to assess the degree of conceptual overlap between the starting and target scales and thus to ensure that a link could be established between TCM-HSS and EQ-5D-3L [14]. Spearman's rank correlation was used to evaluate the association between the two measures, with a coefficient < 0.35 denoting weak correlation, a coefficient from 0.35 to 0.5 (inclusive) denoting moderate correlation, and a coefficient > 0.5 denoting strong correlation [15].
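The cut-offs above translate into a small classification helper. One assumption is made here that the text does not state: the thresholds are applied to the absolute value of the coefficient, since higher TCM-HSS scores indicate worse health and correlations with EQ-5D-3L utilities may therefore be negative.

```python
def correlation_strength(rho):
    """Label a Spearman coefficient using the cut-offs quoted in the text.

    Assumption: the cut-offs apply to the magnitude of the coefficient.
    """
    magnitude = abs(rho)
    if magnitude < 0.35:
        return "weak"
    if magnitude <= 0.5:
        return "moderate"
    return "strong"


print(correlation_strength(0.6479))  # strong
print(correlation_strength(-0.40))   # moderate
```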
The mapping process was as follows. First, the whole sample was divided into a training set and a validation set at a ratio of 2:1 by the random blocking method, with age as the blocking factor. Then, seven econometric methods and four model specifications were combined to construct candidate models on the training set, and the candidate models were evaluated on the validation set to select the model specification with the best predictive performance. Finally, the final mapping algorithm was estimated on the whole sample using the selected model specification, and its predictive performance was checked.
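A blocked random split of this kind could look like the sketch below. The study does not report how age was grouped into blocks, so the ten-year bands, the seed and the synthetic records are all assumptions made for illustration; the original analysis was run in Stata, not Python.

```python
import random
from collections import defaultdict


def blocked_split(records, block_of, train_share=2 / 3, seed=0):
    """Randomly split records into training and validation sets within each block."""
    rng = random.Random(seed)
    blocks = defaultdict(list)
    for record in records:
        blocks[block_of(record)].append(record)

    train, valid = [], []
    for key in sorted(blocks):
        members = blocks[key][:]
        rng.shuffle(members)                      # random order inside the block
        cut = round(len(members) * train_share)   # about 2/3 go to training
        train.extend(members[:cut])
        valid.extend(members[cut:])
    return train, valid


def age_band(record):
    """Hypothetical blocking rule: ten-year age bands, capped at 80+."""
    return min(record["age"] // 10, 8)


# Synthetic respondents, only to show the call; not the study data.
people = [{"id": i, "age": 16 + (i * 7) % 70} for i in range(639)]
train, valid = blocked_split(people, age_band)
print(len(train), len(valid))
```

Because the cut is rounded inside each block, the two sets are close to, but not necessarily exactly, a 2:1 ratio overall.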
Previous systematic reviews found that the most widely used econometric model in mapping was ordinary least squares (OLS), while other methods that account for the distributional characteristics of health utility values were also applied [16,17]. These included the Tobit model [18,19] and censored least absolute deviations (CLAD) [20,21], which handle censored or bounded data because the maximum utility is 1, as well as beta regression [22] and the fractional logistic model [23]. The generalized linear model (GLM) was also used, which offers flexibility across distribution types [24]. Robust linear regressions such as the MM-estimator and M-estimator, which are designed to deal with the effect of outliers, were also applied [25]. Furthermore, more flexible approaches have been developed, such as mixture models, which encompass multiple component models, and two- or three-part models (TPM or 3PM), which combine logistic and linear regression models for respondents at and below full health. The adjusted limited dependent variable mixture model (ALDVMM), developed specifically for EQ-5D data, accounts for both the potential distribution of the data and the limits on utility values [26,27], and has been increasingly applied in mapping studies. Multinomial logistic regression (MLR) and ordinal logistic regression, which are suitable for unordered and ordered multi-category dependent variables, respectively, have been used for indirect mapping to PBM dimensions or items.
In our study, seven econometric methods were applied: OLS, Tobit, CLAD, GLM, ALDVMM, TPM and ordinal logistic regression. All except the last, which is an indirect (response) mapping method, are direct mapping methods that directly predict the EQ-5D-3L utility score (based on the Chinese tariff [12]). For the GLM, the dependent variable was the disutility (1 minus the EQ-5D-3L utility value) and the specified distribution family was gamma. The ALDVMM was set to 2 components because the model failed to converge when tested with up to three components. In the TPM [28], a logit regression was used for the first part and the aforementioned GLM for the second part, and the expected utility was estimated as Pr(Utility = 1) + U × (1 − Pr(Utility = 1)), where U is the predicted utility at imperfect health and Pr(Utility = 1) is the predicted probability of full health. Response mapping predicts the response level for each of the five EQ-5D questions and then calculates the EQ-5D-3L utility value according to the health utility scoring system. We used the ordered logistic regression model after testing its parallelism (proportional odds) assumption; multinomial logistic regression is the alternative if that assumption is violated [29,30].
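The way the two parts of the TPM are combined can be illustrated with a short sketch. It assumes the first part has already produced a probability of full health and the second part a predicted disutility (1 minus utility, as modelled by the GLM); the function names and the numbers in the example are hypothetical, not fitted values from the study.

```python
def utility_from_disutility(predicted_disutility):
    """Convert a second-part GLM prediction (1 - utility) back to a utility."""
    return 1.0 - predicted_disutility


def expected_utility(p_full_health, utility_if_imperfect):
    """Pr(Utility = 1) + U * (1 - Pr(Utility = 1)), as given in the text."""
    if not 0.0 <= p_full_health <= 1.0:
        raise ValueError("p_full_health must be a probability")
    return p_full_health + utility_if_imperfect * (1.0 - p_full_health)


# Hypothetical respondent: 70% chance of full health, predicted disutility 0.2.
u = utility_from_disutility(0.2)
print(expected_utility(0.7, u))  # approximately 0.94
```

Since the result is a weighted average of 1 and a utility below 1, the prediction stays below 1 whenever the probability of full health is below 1.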
We assessed the predictive performance of the candidate models by jointly considering three indicators: (1) mean absolute error (MAE), where smaller is better; (2) root mean squared error (RMSE), where smaller is better; (3) Spearman's rank correlation coefficient between the observed and predicted EQ-5D-3L values, where closer to 1 is better. All statistical analyses were performed in Microsoft Excel and Stata/MP (Stata Corp. LP, College Station, Texas). The manuscript followed the mapping onto preference-based measures reporting standards (MAPS) checklist [31] (see Additional file 2).
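For readers who want to reproduce the three indicators outside Stata, a dependency-free sketch is given below. It is not the authors' code; Spearman's coefficient is computed as the Pearson correlation of tie-averaged ranks, and the observed and predicted values in the example are invented.

```python
import math


def mae(observed, predicted):
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def rmse(observed, predicted):
    """Root mean squared error."""
    return math.sqrt(
        sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)
    )


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman(observed, predicted):
    """Spearman's rank correlation (Pearson correlation of the ranks)."""
    return pearson(average_ranks(observed), average_ranks(predicted))


observed = [1.0, 0.8, 0.6, 1.0]     # invented utilities
predicted = [0.95, 0.85, 0.5, 0.9]  # invented predictions
print(mae(observed, predicted), rmse(observed, predicted), spearman(observed, predicted))
```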

Descriptive summary
A total of 639 respondents were ultimately included in the analysis, of whom 581 came from the hospital visit population and 58 from community residents. The average age of respondents was 45.24 years, and 61.66% were female. More than half of the sample were employed (52.9%) and sometimes controlled their diet (52.11%). Most were married (71.52%), had never smoked (77.46%), had never drunk alcohol (59.62%), and were covered by basic medical insurance for employees (63.07%). More specific information is shown in Table 1.
With age as the blocking factor, the whole sample was randomly divided into a training set (N1 = 426) and a validation set (N2 = 213) at a ratio of 2:1. The basic information is also shown in Table 1. Moreover, hypothesis tests of differences between the training and validation sets in terms of age, gender, etc., were performed to ensure that the two subsets were balanced (see Additional file 1: Table S2). The test results showed no significant differences between the two subsets in terms of demographics (e.g., gender, age, BMI) and scale scores (P > 0.05).

Characteristics of the two instruments
The mean EQ-5D-3L utility index was 0.9225 [standard deviation (SD) = 0.1458], with a minimum of 0.056 and a maximum of 1.000. The mean TCM-HSS score was 3.4144 (SD = 3.1154), with a minimum of 0 and a maximum of 19.4333. The average scores of the eight TCM-HSS subscales were all less than 1, with the highest in the energy subscale (mean = 0.6174; SD = 0.5175) and the lowest in the urination subscale (mean = 0.1758; SD = 0.3115). More specific information, including the training and validation sets, is shown in Table 2. The Spearman's rank correlation coefficients between TCM-HSS and EQ-5D-3L were all moderate to strong (P < 0.05), indicating sufficient conceptual overlap for a mapping algorithm to be fitted between the two scales. The correlation results are displayed in Additional file 1: Tables S3-S4.

Comparison of model performance
Each regression method was combined with four model specifications, so a total of 28 candidate models were developed on the training set. Table 3 shows the predictive performance of the candidate models, with the best-performing one in bold. On the validation set, and according to the combined ranking of the three indicators, TPM4 was the best-performing model, followed by ALDVMM4, while GLM2 and GLM4 performed less well. TPM4 is a two-part model with the eight TCM-HSS subscale scores and covariates (age and gender) as independent variables. The MAE, RMSE, and Spearman's rank correlation coefficient between the observed and predicted EQ-5D-3L values for TPM4 were 0.0554, 0.0974, and 0.6550, respectively. In addition, the extent to which each of the 28 candidate models overestimated or underestimated EQ-5D-3L utility values was computed on the validation set; the results are detailed in Additional file 1: Tables S5-S6.

Establishment and evaluation of the final mapping algorithm
The coefficients of the variables in the final mapping algorithm were estimated on the whole sample. Squared terms and interaction terms, which may improve the performance of the final mapping algorithm, were considered as in other studies [32,33]. We adjusted the final mapping model in four steps. (1) A two-part regression model with the eight TCM-HSS subscale scores as independent variables was fitted on the total sample; (2) the squared terms (i.e., EnT*EnT, PaT*PaT, etc.) were added to the model from step 1 as new independent variables and removed if not significant (P > 0.05); (3) the two-way interaction terms (i.e., EnT*PaT, EnT*DiT, etc.) were added to the model from step 2 as new independent variables and removed if not significant (P > 0.05); (4) age and gender were added to the model from step 3 as demographic independent variables. The fitting results of each step are presented in Additional file 1: Table S7; the regression model obtained in step 4 was the top performer (see Table 4).
In the final mapping algorithm, the predicted values ranged from 0.2567 to 0.9965, and the observed values ranged from 0.056 to 1.000. Nonetheless, both had the highest proportion of values in the interval (0.80, 1]. Spearman's rank correlation between the two was 0.6479 (P < 0.0001), showing a moderate-to-strong correlation. We also drew a scatterplot of observed against predicted values (see Additional file 1: Fig. 1). A previous review found that the MAE, RMSE and mean error (ME) of published mapping algorithms lay in [0.0011, 0.19], [0.084, 0.2] and [0.0007, 0.042], respectively [16]. In our study, the proportion of absolute errors (AE) > 0.05, the ME, MAE, RMSE, and Spearman's rank correlation coefficient between the observed and predicted EQ-5D-3L values were 34.12%, 0.0003, 0.0566, 0.1039, and 0.6479, respectively (Table 5); these error measures are within or better than the above-mentioned ranges and indicate a high level of performance. Furthermore, MAE, RMSE, and ME were calculated for different utility ranges (Table 5), and a line chart of these indicators was plotted (see Additional file 1: Fig. 2).
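The mean error and the share of large absolute errors reported here can be computed as in the sketch below. The sign convention (predicted minus observed) is an assumption, since the text does not state it, and the example values are invented rather than taken from the study data.

```python
def mean_error(observed, predicted):
    """Mean of (predicted - observed); the sign convention is an assumption."""
    return sum(p - o for o, p in zip(observed, predicted)) / len(observed)


def share_abs_error_above(observed, predicted, threshold):
    """Proportion of predictions whose absolute error exceeds the threshold."""
    exceed = sum(1 for o, p in zip(observed, predicted) if abs(p - o) > threshold)
    return exceed / len(observed)


observed = [1.0, 0.9, 0.5, 0.8]     # invented utilities
predicted = [0.98, 0.8, 0.7, 0.8]   # invented predictions
print(mean_error(observed, predicted))                   # approximately 0.02
print(share_abs_error_above(observed, predicted, 0.05))  # 0.5
```

A mean error close to zero, as reported for the final algorithm, means that over- and under-predictions roughly cancel on average; it does not by itself imply small individual errors, which is why the MAE, RMSE and the share of large errors are reported alongside it.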

Discussion
In this study, 28 candidate models were first built by direct and indirect mapping methods, and the preferred model was then selected based on MAE, RMSE and the correlation coefficient between the observed and predicted EQ-5D-3L values. Afterwards, we adjusted the preferred model step by step to obtain the final mapping algorithm (Additional file 1: Table S7). Generally, mapping algorithms tend to overestimate poorer health status and underestimate better health status [34]. Ours showed the same pattern: for observed utilities of 1, the predicted values were all below 1 but very close to 1, whereas for observed utilities below 0.6 the predictions were overestimated. More specific information is presented in Additional file 1: Table S8 and Fig. 3. Nevertheless, compared with previous mapping algorithms, the MAE, RMSE and correlation of our algorithm were superior, indicating good prediction accuracy. Empirical research based on the mapping method in China is still in its infancy. We searched Chinese databases (CNKI, Wan Fang and VIP) with keywords such as "mapping", "utility", "health utility", and "quality of life scale", and found 17 Chinese publications related to the mapping method for measuring utility, of which only six were empirical studies [35-40]. Only one of the six studies used both direct and indirect mapping methods (a total of seven regression methods), mapping EORTC QLQ-BR53 to EQ-5D-5L and SF-6D respectively [38]; the rest used direct mapping methods only, with few regression methods.
By contrast, international empirical research using the mapping method to obtain utilities has reached a mature stage, involving various measures and regression methods. Mukuria [17] reviewed 180 empirical mapping studies published from January 2007 to October 2018 with seven PBMs such as EQ-5D as the target scales, covering 233 mapping algorithms and numerous methods such as two-part/three-part models, mixed models, fractional response models and the adjusted limited dependent variable mixture model. Nevertheless, no mapping algorithm between TCM measures and PBMs has been reported in empirical mapping research either in China or abroad. The mapping algorithm presented in this study fills this gap and provides a new approach to acquiring utility values from TCM measures.
Although the recommended mapping algorithm showed good predictive performance and provides an approach for acquiring utility values, our study has several potential limitations. First, external validity was relatively insufficient: the performance of the developed mapping algorithm was evaluated on the whole sample, so external validity needs to be assessed with independent external datasets in the future. Second, the EQ-5D-3L data showed an evident ceiling effect and were heavily skewed, with most values clustered near 1, which probably introduced bias and uncertainty; a mapping algorithm between EQ-5D-5L and TCM-HSS could therefore be explored in the future to reduce the ceiling effect of EQ-5D-3L. Finally, our data came from a single-center cross-sectional questionnaire survey, and it is currently unclear whether the final mapping algorithm is stable over time; hence, a large-sample, multi-regional, multi-center dataset could be collected in the future, and the relationship between the mapping algorithm and time could be investigated with longitudinal data.

Conclusion
This is the first study to develop a mapping algorithm from TCM-HSS onto EQ-5D-3L based on direct and response mapping approaches, and the algorithm exhibits excellent predictive accuracy. Utility values can thus be obtained from TCM quality of life measures when EQ-5D data are not available, which provides technical support and improves the level of evidence for the economic evaluation of TCM.